conversion rate
Beyond Differences: Doubly Robust Meta-Learners for Ratio-Based Treatment Effects
Fuchs, Michael, Kreiss, Dominik
When treatment effects are naturally expressed as ratios, as in medicine, pricing, and marketing, the ratio-based CATE $\tau(x) = E[Y|W=1,X=x] / E[Y|W=0,X=x]$ is the appropriate estimand. Yet existing estimators either impose a log-linear parametric structure or apply generic regression without robustness guarantees for this functional. We introduce the Q-Learner, which decomposes $\tau(x)$ into a product of two odds ratios, reducing ratio-CATE estimation for binary outcomes to two propensity classification tasks. We further derive doubly robust augmentations for both S/T- and Q-style ratio learners and characterize their distinct robustness properties. In benchmarks on seven RCT datasets, the Q-Learner is the most consistently competitive method in low-conversion regimes, where its propensity-only construction sidesteps the imbalanced regression that hurts outcome-based estimators. On four observational datasets, where propensity must be estimated and confounding cannot be ruled out, the DR learners introduced here decisively come out on top, making them practitioners' natural default for confounded observational data.
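For binary $Y$, Bayes' rule yields one identity of this kind: $\tau(x) = \frac{P(W=1 \mid Y=1, X=x)}{P(W=0 \mid Y=1, X=x)} \cdot \frac{P(W=0 \mid X=x)}{P(W=1 \mid X=x)}$, i.e. the odds of treatment among converters times the inverse odds of treatment overall. The abstract does not spell out the Q-Learner's decomposition, so treat this as our reading, not the authors' derivation. The sketch below checks the identity numerically at a single $x$ and, separately, shows a plain AIPW estimate of the *marginal* risk ratio, a standard doubly robust construction that is simpler than (and not the same as) the conditional DR learners described above. All function names and numbers are hypothetical.

```python
from typing import Sequence


def ratio_direct(p_y1_w1: float, p_y1_w0: float) -> float:
    """tau = P(Y=1 | W=1) / P(Y=1 | W=0) at a fixed x."""
    return p_y1_w1 / p_y1_w0


def ratio_via_odds(e: float, p_y1_w1: float, p_y1_w0: float) -> float:
    """Rebuild tau from two propensity-style odds via Bayes' rule.

    e is the propensity P(W=1 | x). In practice q = P(W=1 | Y=1, x) would be
    a classifier fit on converters only; here it is computed exactly.
    """
    p_y1 = e * p_y1_w1 + (1 - e) * p_y1_w0      # P(Y=1 | x)
    q = e * p_y1_w1 / p_y1                       # P(W=1 | Y=1, x)
    odds_treated_among_converters = q / (1 - q)
    inverse_odds_treated = (1 - e) / e
    return odds_treated_among_converters * inverse_odds_treated


def aipw_risk_ratio(
    y: Sequence[float],
    w: Sequence[int],
    e_hat: Sequence[float],
    m1_hat: Sequence[float],
    m0_hat: Sequence[float],
) -> float:
    """Marginal risk ratio from AIPW estimates of E[Y(1)] and E[Y(0)].

    Consistent if either the propensity model e_hat or the outcome models
    m1_hat / m0_hat are correct (the usual doubly robust property).
    """
    n = len(y)
    mu1 = sum(m1 + wi * (yi - m1) / ei
              for yi, wi, ei, m1 in zip(y, w, e_hat, m1_hat)) / n
    mu0 = sum(m0 + (1 - wi) * (yi - m0) / (1 - ei)
              for yi, wi, ei, m0 in zip(y, w, e_hat, m0_hat)) / n
    return mu1 / mu0
```

The second AIPW check in the usage below passes deliberately wrong outcome models (all zeros) with a correct propensity and still recovers the same ratio, which is the robustness property the abstract refers to in its conditional form.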
Generalized Delayed Feedback Model with Post-Click Information in Recommender Systems
However, accurate conversion labels are revealed after a long delay, which harms the timeliness of recommender systems. Previous literature concentrates on utilizing early conversions to mitigate such a delayed feedback problem. In this paper, we show that post-click user behaviors are also informative to conversion rate prediction and can be used to improve timeliness.
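The abstract does not describe its own model, so as background on the delayed feedback problem here is the classic exponential-delay Delayed Feedback Model (DFM) of Chapelle (KDD 2014), swapped in for illustration and not the method of this paper. It assumes a conversion probability $p$ and an exponentially distributed conversion delay with rate $\lambda$; an unconverted click is then ambiguous, since it may be a true negative or a conversion that has not arrived yet. The function name and arguments are hypothetical.

```python
import math


def dfm_log_likelihood(p: float, lam: float, converted: bool, t: float) -> float:
    """Per-click log-likelihood under an exponential delay model.

    p:   predicted conversion probability for the click
    lam: rate of the exponential conversion-delay distribution
    t:   observed delay if converted, else time elapsed since the click
    """
    if converted:
        # converts (prob p) AND the delay equals t (exponential density)
        return math.log(p) + math.log(lam) - lam * t
    # either never converts, or converts later than the elapsed time t
    return math.log(1 - p + p * math.exp(-lam * t))
```

Right after the click ($t = 0$) an unconverted label carries no information (log-likelihood 0); only as $t$ grows does it approach a true negative, $\log(1-p)$. That gap is the timeliness cost the abstract describes.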
Sponsored Questions and How to Auction Them
Bhawalkar, Kshipra, Psomas, Alexandros, Wang, Di
Online platforms connect users with relevant products and services using ads. A key challenge is that a user's search query often leaves their true intent ambiguous. Typically, platforms passively predict relevance based on available signals and in some cases offer query refinements. The shift from traditional search to conversational AI provides a new approach. When a user's query is ambiguous, a Large Language Model (LLM) can proactively offer several clarifying follow-up prompts. In this paper we consider the following: what if some of these follow-up prompts could be "sponsored," i.e., selected for their advertising potential? How should these "suggestion slots" be allocated? And how does this new mechanism interact with the traditional ad auction that might follow? This paper introduces a formal model for designing and analyzing these interactive platforms. We use this model to investigate a critical engineering choice: whether it is better to build an end-to-end pipeline that jointly optimizes the user interaction and the final ad auction, or to decouple them into one mechanism for the suggestion slots and another for the subsequent ad slot. We show that the VCG mechanism can be adopted to jointly optimize the sponsored suggestion and the ads that follow; while this mechanism is more complex, it achieves outcomes that are efficient and truthful. On the other hand, we prove that the simple-to-implement modular approach suffers from strategic inefficiency: its Price of Anarchy is unbounded.
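A minimal sketch of textbook VCG (with Clarke pivot payments) over a finite set of joint outcomes, assuming each outcome is a pair (sponsored suggestion, ad winner) and each advertiser reports a value for every outcome. This is the generic mechanism, not necessarily the paper's exact adaptation; the outcome encoding and advertiser names are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Outcome = Hashable


def vcg(
    valuations: Dict[str, Dict[Outcome, float]],
    outcomes: List[Outcome],
) -> Tuple[Outcome, Dict[str, float]]:
    """Pick the welfare-maximizing outcome; charge each bidder its externality."""

    def welfare(o: Outcome, excluded: Optional[str] = None) -> float:
        # Total reported value of outcome o, optionally ignoring one bidder.
        return sum(v.get(o, 0.0) for b, v in valuations.items() if b != excluded)

    chosen = max(outcomes, key=welfare)
    payments = {}
    for bidder in valuations:
        # Best welfare the others could get if this bidder were absent ...
        best_without = max(welfare(o, excluded=bidder) for o in outcomes)
        # ... minus what the others actually get under the chosen outcome.
        payments[bidder] = best_without - welfare(chosen, excluded=bidder)
    return chosen, payments


# Hypothetical joint outcomes: (sponsored suggestion owner, ad slot winner).
outcomes = [("A", "A"), ("B", "B"), ("A", "B")]
valuations = {
    "A": {("A", "A"): 10.0, ("A", "B"): 4.0},
    "B": {("B", "B"): 8.0, ("A", "B"): 5.0},
}
chosen, payments = vcg(valuations, outcomes)
```

Here the joint outcome ("A", "A") has welfare 10, beating ("A", "B") at 9 and ("B", "B") at 8, so A wins both the suggestion and the ad and pays 8, the welfare B alone could have achieved. Optimizing both stages in one welfare calculation is what makes truthful bidding a dominant strategy; splitting them into two separate auctions loses that guarantee.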
Enhancing Talent Search Ranking with Role-Aware Expert Mixtures and LLM-based Fine-Grained Job Descriptions
Li, Jihang, Xu, Bing, Chen, Zulong, Xu, Chuanfei, Chen, Minping, Liu, Suyu, Zhou, Ying, Wen, Zeyi
Talent search is a cornerstone of modern recruitment systems, yet existing approaches often struggle to capture nuanced job-specific preferences, model recruiter behavior at a fine-grained level, and mitigate noise from subjective human judgments. We present a novel framework that enhances talent search effectiveness and delivers substantial business value through two key innovations: (i) leveraging LLMs to extract fine-grained recruitment signals from job descriptions and historical hiring data, and (ii) employing a role-aware multi-gate MoE network to capture behavioral differences across recruiter roles. To further reduce noise, we introduce a multi-task learning module that jointly optimizes click-through rate (CTR), conversion rate (CVR), and resume matching relevance. Experiments on real-world recruitment data and online A/B testing show relative AUC gains of 1.70% (CTR) and 5.97% (CVR), and a 17.29% lift in click-through conversion rate. These improvements reduce dependence on external sourcing channels, enabling an estimated annual cost saving of millions of CNY.
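A minimal forward-pass sketch of a multi-gate mixture of experts where, as an assumption about what "role-aware" means, each (recruiter role, task) pair gets its own softmax gate over shared experts. The single-layer ReLU experts, the gate keying, and all names are hypothetical; task-specific towers (e.g., sigmoid heads for CTR and CVR) are omitted.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]  # one row per output unit


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def role_aware_mmoe(
    x: Vector,
    experts: List[Matrix],
    gates: Dict[Tuple[str, str], Matrix],
    role: str,
    tasks: List[str],
) -> Dict[str, Vector]:
    # Shared experts: each is a single ReLU layer over the input features.
    expert_outputs = [[max(0.0, h) for h in linear(W, x)] for W in experts]
    dim = len(expert_outputs[0])
    mixed = {}
    for task in tasks:
        # One gate per (role, task); it emits one logit per expert.
        weights = softmax(linear(gates[(role, task)], x))
        mixed[task] = [
            sum(g * out[d] for g, out in zip(weights, expert_outputs))
            for d in range(dim)
        ]
    return mixed


experts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
gates = {
    ("recruiter", "ctr"): [[0.0, 0.0], [0.0, 0.0]],   # uniform mix
    ("recruiter", "cvr"): [[10.0, 0.0], [0.0, 0.0]],  # strongly prefers expert 0
}
out = role_aware_mmoe([1.0, 2.0], experts, gates, "recruiter", ["ctr", "cvr"])
```

Because the experts are shared and only the gates differ, different roles and tasks can reuse the same learned features while weighting them differently, which is the usual motivation for MMoE over a single shared-bottom network.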
Generative AI and Firm Productivity: Field Experiments in Online Retail
Fang, Lu, Yuan, Zhe, Zhang, Kaifu, Donati, Dante, Sarvary, Miklos
We quantify the impact of Generative Artificial Intelligence (GenAI) on firm productivity through a series of large-scale randomized field experiments involving millions of users and products at a leading cross-border online retail platform. Over six months in 2023-2024, GenAI-based enhancements were integrated into seven consumer-facing business workflows. We find that GenAI adoption significantly increases sales, with treatment effects ranging from $0\%$ to $16.3\%$, depending on GenAI's marginal contribution relative to existing firm practices. Because inputs and prices were held constant across experimental arms, these gains map directly into total factor productivity improvements. Across the four GenAI applications with positive effects, the implied annual incremental value is approximately $\$5$ per consumer, an economically meaningful impact given the retailer's scale and the early stage of GenAI adoption. The primary mechanism operates through higher conversion rates, consistent with GenAI reducing frictions in the marketplace and improving consumer experience. We also document substantial heterogeneity: smaller and newer sellers, as well as less experienced consumers, exhibit disproportionately larger gains. Our findings provide novel, large-scale causal evidence on the productivity effects of GenAI in online retail, highlighting both its immediate value and broader potential.
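Effects like the $16.3\%$ above are relative lifts in sales between a treatment and a control arm. A minimal sketch of computing such a lift with an approximate confidence interval, assuming per-user sales in two independent arms and using the delta method for a ratio of means; this estimator is our choice for illustration, not necessarily the one the authors used.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def relative_lift(
    treated: Sequence[float], control: Sequence[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (lift, lower, upper) for mean(treated) / mean(control) - 1."""
    m_t, m_c = mean(treated), mean(control)
    ratio = m_t / m_c
    # Delta method: Var(m_t / m_c) ~= ratio^2 * (Var(m_t)/m_t^2 + Var(m_c)/m_c^2),
    # where Var(sample mean) = sample variance / n and the arms are independent.
    rel_var = (variance(treated) / (len(treated) * m_t ** 2)
               + variance(control) / (len(control) * m_c ** 2))
    se = ratio * math.sqrt(rel_var)
    lift = ratio - 1.0
    return lift, lift - z * se, lift + z * se


lift, lo, hi = relative_lift([11.0, 13.0, 12.0, 12.0], [10.0, 10.0, 10.0, 10.0])
```

With these toy numbers the point estimate is a 20% lift; real experiments with millions of users would give far tighter intervals than this four-user example.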
Personalized Auto-Grading and Feedback System for Constructive Geometry Tasks Using Large Language Models on an Online Math Platform
Lee, Yong Oh, Bang, Byeonghun, Lee, Joohyun, Oh, Sejun
As personalized learning gains increasing attention in mathematics education, there is a growing demand for intelligent systems that can assess complex student responses and provide individualized feedback in real time. In this study, we present a personalized auto-grading and feedback system for constructive geometry tasks, developed using large language models (LLMs) and deployed on the Algeomath platform, a Korean online tool designed for interactive geometric constructions. The proposed system evaluates student-submitted geometric constructions by analyzing their procedural accuracy and conceptual understanding. It employs a prompt-based grading mechanism using GPT-4, where student answers and model solutions are compared through a few-shot learning approach. Feedback is generated based on teacher-authored examples built from anticipated student responses, and it dynamically adapts to the student's problem-solving history, allowing up to four iterative attempts per question. The system was piloted with 79 middle-school students, where LLM-generated grades and feedback were benchmarked against teacher judgments. Grading closely aligned with teachers, and feedback helped many students revise errors and complete multi-step geometry tasks. While short-term corrections were frequent, longer-term transfer effects were less clear. Overall, the study highlights the potential of LLMs to support scalable, teacher-aligned formative assessment in mathematics, while pointing to improvements needed in terminology handling and feedback design.
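A minimal sketch of the loop described above: a few-shot prompt assembled from a model solution, teacher-authored examples, and the student's earlier attempts, with at most four attempts per question. The GPT-4 call is replaced by a hypothetical `grader` callable, and the prompt wording, `Example` fields, and function names are all assumptions, not the deployed system's.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

MAX_ATTEMPTS = 4  # from the text: up to four iterative attempts per question


@dataclass
class Example:
    """A teacher-authored example built from an anticipated student response."""
    student_answer: str
    grade: str
    feedback: str


def build_prompt(model_solution: str, examples: Sequence[Example],
                 history: Sequence[str], answer: str) -> str:
    lines = ["You are grading a geometric construction.",
             f"Model solution: {model_solution}", ""]
    for i, ex in enumerate(examples, 1):  # few-shot examples
        lines += [f"Example {i} answer: {ex.student_answer}",
                  f"Example {i} grade: {ex.grade}",
                  f"Example {i} feedback: {ex.feedback}", ""]
    if history:  # adapt feedback to this student's earlier attempts
        lines.append("Previous attempts by this student:")
        lines += [f"- {h}" for h in history]
    lines += [f"Student answer: {answer}", "Grade and feedback:"]
    return "\n".join(lines)


def grade_with_retries(
    answers: Sequence[str],
    grader: Callable[[str], Tuple[bool, str]],  # hypothetical stand-in for the LLM
    model_solution: str,
    examples: Sequence[Example],
) -> Tuple[Optional[int], List[str]]:
    """Return (attempt number that succeeded or None, attempt history)."""
    history: List[str] = []
    for attempt, answer in enumerate(answers[:MAX_ATTEMPTS], 1):
        correct, feedback = grader(build_prompt(model_solution, examples, history, answer))
        history.append(f"{answer} -> {feedback}")
        if correct:
            return attempt, history
    return None, history
```

Passing the history back into each prompt is what lets the feedback change across attempts instead of repeating the same hint.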
Profit over Proxies: A Scalable Bayesian Decision Framework for Optimizing Multi-Variant Online Experiments
Pillai, Srijesh, Chandrawat, Rajesh Kumar
Online controlled experiments (A/B tests) are fundamental to data-driven decision-making in the digital economy. However, their real-world application is frequently compromised by two critical shortcomings: the use of statistically flawed heuristics like "p-value peeking", which inflates false positive rates, and an over-reliance on proxy metrics like conversion rates, which can lead to decisions that inadvertently harm core business profitability. This paper addresses these challenges by introducing a comprehensive and scalable Bayesian decision framework designed for profit optimization in multi-variant (A/B/n) experiments. We propose a hierarchical Bayesian model that simultaneously estimates the probability of conversion (using a Beta-Bernoulli model) and the monetary value of that conversion (using a robust Bayesian model for the mean transaction value). Building on this, we employ a decision-theoretic stopping rule based on Expected Loss, enabling experiments to be concluded not only when a superior variant is identified but also when it becomes clear that no variant offers a practically significant improvement (stopping for futility). The framework successfully navigates "revenue traps", where a variant with a higher conversion rate would have resulted in a net financial loss, correctly terminates futile experiments early to conserve resources, and maintains strict statistical integrity throughout the monitoring process. Ultimately, this work provides a practical and principled methodology for organizations to move beyond simple A/B testing towards a mature, profit-driven experimentation culture, ensuring that statistical conclusions translate directly to strategic business value.
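A minimal Monte Carlo sketch of the Expected Loss idea, assuming a Beta-Bernoulli posterior for conversion and, as a simplification we substitute for the paper's robust model, a normal approximation to the posterior of the mean transaction value. Value per visitor is conversion probability times transaction value; the function names, the uniform Beta(1, 1) prior, and the stopping threshold are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def posterior_value_samples(conversions: int, visitors: int, revenues: Sequence[float],
                            draws: int, rng: random.Random) -> List[float]:
    """Posterior draws of value per visitor = P(convert) * mean transaction value."""
    n = len(revenues)
    m = sum(revenues) / n
    se = math.sqrt(sum((r - m) ** 2 for r in revenues) / (n - 1) / n)
    samples = []
    for _ in range(draws):
        p = rng.betavariate(1 + conversions, 1 + visitors - conversions)  # Beta(1,1) prior
        value = rng.gauss(m, se)  # normal approximation (our simplification)
        samples.append(p * value)
    return samples


def expected_losses(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """E[max_j theta_j - theta_i]: expected regret of shipping each variant."""
    names = list(samples)
    draws = len(samples[names[0]])
    totals = {name: 0.0 for name in names}
    for i in range(draws):
        best = max(samples[name][i] for name in names)
        for name in names:
            totals[name] += best - samples[name][i]
    return {name: t / draws for name, t in totals.items()}


def should_stop(losses: Dict[str, float], threshold: float) -> Optional[str]:
    """Stop and pick a variant once its expected loss is below the threshold."""
    name = min(losses, key=losses.get)
    return name if losses[name] < threshold else None


rng = random.Random(0)
samples = {
    "A": posterior_value_samples(100, 1000, [40.0, 60.0] * 50, 4000, rng),  # 10% CR, $50
    "B": posterior_value_samples(120, 1000, [20.0, 40.0] * 60, 4000, rng),  # 12% CR, $30
}
losses = expected_losses(samples)
```

Variant B converts better but earns less per visitor, the "revenue trap" from the abstract; ranking by expected loss on value rather than on conversion rate picks A. When all variants are near-equivalent, every expected loss becomes small, so the same threshold rule also stops for futility.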
Personalized Recommendation of Dish and Restaurant Collections on iFood
Granado, Fernando F., Bezerra, Davi A., Queiroz, Iuri, Oliveira, Nathan, Fernandes, Pedro, Schock, Bruno
Food delivery platforms face the challenge of helping users navigate vast catalogs of restaurants and dishes to find meals they truly enjoy. This paper presents RED, an automated recommendation system designed for iFood, Latin America's largest on-demand food delivery platform, to personalize the selection of curated food collections displayed to millions of users. Our approach employs a LightGBM classifier that scores collections based on three feature groups: collection characteristics, user-collection similarity, and contextual information. To address the cold-start problem of recommending newly created collections, we develop content-based representations using item embeddings and implement monotonicity constraints to improve generalization. We tackle data scarcity by bootstrapping from category carousel interactions and address visibility bias through unbiased sampling of impressions and purchases in production. The system demonstrates significant real-world impact through extensive A/B testing with 5-10% of iFood's user base. Online results of our A/B tests add up to 97% improvement in Card Conversion Rate and 1.4% increase in overall App Conversion Rate compared to popularity-based baselines. Notably, our offline accuracy metrics strongly correlate with online performance, enabling reliable impact prediction before deployment. To our knowledge, this is the first work to detail large-scale recommendation of curated food collections in a dynamic commercial environment.
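A minimal sketch of one content-based user-collection similarity feature, assuming (our assumption, not the paper's stated recipe) that users and collections are each represented by the mean of their dishes' item embeddings and compared by cosine similarity. Such a score could be one input to the LightGBM classifier; the dish names and 2-d embeddings are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_embedding(item_ids: Sequence[str], item_embeddings: Dict[str, Vector]) -> Vector:
    vectors = [item_embeddings[i] for i in item_ids if i in item_embeddings]
    if not vectors:
        raise ValueError("none of the items has an embedding")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def user_collection_similarity(user_history: Sequence[str], collection_items: Sequence[str],
                               item_embeddings: Dict[str, Vector]) -> float:
    # A brand-new collection has no interactions yet, but its dishes already
    # have embeddings, so this feature is available from day one (cold start).
    return cosine(mean_embedding(user_history, item_embeddings),
                  mean_embedding(collection_items, item_embeddings))


emb = {"pizza": [1.0, 0.0], "burger": [0.9, 0.1], "sushi": [0.0, 1.0], "ramen": [0.1, 0.9]}
fast_food = user_collection_similarity(["pizza", "burger"], ["pizza"], emb)
japanese = user_collection_similarity(["pizza", "burger"], ["sushi", "ramen"], emb)
```

A user whose history is pizza and burgers scores a new pizza collection far higher than a sushi-and-ramen one, even though neither collection has been shown to anyone yet.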
Amazon's AI wants to own online shopping data
Amazon already dominates online shopping, but now it's setting its sights even higher. With a new artificial intelligence-powered project called Starfish, the company aims to become the world's most complete and trusted source of product information. The goal? Make every listing on Amazon accurate, detailed and easy to understand, whether the product is sold by Amazon or a third-party seller. If the project works as planned, it could save sellers hours of work and help shoppers find what they need faster.